Concurrent Algorithms and Data Structures for Many-Core Processors

Author

  • Daniel Cederman

Abstract

The convergence of highly parallel many-core graphics processors with conventional multi-core processors is becoming a reality. To allow algorithms and data structures to scale efficiently on these new platforms, several important factors need to be considered. (i) The algorithmic design needs to utilize the inherent parallelism of the problem at hand. Sorting, one of the classic computing components in computer science, has a high degree of inherent parallelism. In this thesis we present the first efficient design of Quicksort for graphics processors and show that it performs well in comparison with other available sorting methods. (ii) The work needs to be distributed efficiently across the available processing units. We present an evaluation of a set of dynamic load balancing schemes for graphics processors, comparing blocking methods with non-blocking ones. (iii) The required synchronization needs to be efficient, composable and easy to use. We present a methodology to easily compose the two most common operations provided by a data structure: the insertion and deletion of elements. By exploiting a common construction found in most non-blocking data structures, we created a move operation that can atomically move elements between different types of non-blocking data structures, without requiring a specific design for each coupling. We also present, to the best of our knowledge, the first application of software transactional memory (STM) to graphics processors. Two different STM designs, one blocking and one obstruction-free, were evaluated on the task of implementing different types of common concurrent data structures on a graphics processor.
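The data parallelism that makes Quicksort attractive on a graphics processor lies mainly in the partition step. The following is a minimal sequential sketch of one common way to make that step data-parallel, assuming a two-pass scheme. In the first pass, each thread block counts how many of its elements fall below and above the pivot. An exclusive prefix sum over those counts then reserves a private range of output slots for every block, and in the second pass each block scatters its elements into its own slots. The function names `gpu_style_partition` and `quicksort`, the chunking into `num_blocks` pieces, and the median-of-three pivot rule are hypothetical choices for illustration, not the thesis's kernel.

```python
from itertools import accumulate


def gpu_style_partition(data, pivot, num_blocks):
    """Partition `data` around `pivot` with a two-pass, block-parallel scheme.

    Sequential simulation: each chunk stands in for one (hypothetical) GPU
    thread block. Assumes num_blocks >= 1. Returns (out, lo, hi) where every
    element of out[:lo] is < pivot, of out[lo:hi] == pivot, of out[hi:] > pivot.
    """
    n = len(data)
    size = -(-n // num_blocks)  # ceiling division: elements per chunk
    chunks = [data[i * size:(i + 1) * size] for i in range(num_blocks)]

    # Pass 1: every block counts its elements below and above the pivot.
    less = [sum(1 for x in c if x < pivot) for c in chunks]
    greater = [sum(1 for x in c if x > pivot) for c in chunks]

    # Exclusive prefix sums give each block a private range of output slots,
    # so in pass 2 no two blocks ever write to the same position.
    less_off = [0] + list(accumulate(less))[:-1]
    greater_off = [0] + list(accumulate(greater))[:-1]
    lo, hi = sum(less), n - sum(greater)

    out = [None] * n
    # Pass 2: every block scatters its elements into its reserved slots.
    for c, l_pos, g_pos in zip(chunks, less_off, greater_off):
        for x in c:
            if x < pivot:
                out[l_pos] = x
                l_pos += 1
            elif x > pivot:
                out[hi + g_pos] = x
                g_pos += 1

    # Elements equal to the pivot fill the gap between the two regions.
    for i in range(lo, hi):
        out[i] = pivot
    return out, lo, hi


def quicksort(data, num_blocks=4):
    """Recursive quicksort built on the partition above (illustrative only)."""
    if len(data) <= 1:
        return list(data)
    # Median-of-three pivot: an assumed rule, chosen so the pivot is always
    # an element of `data` and every recursive call is strictly smaller.
    pivot = sorted([data[0], data[len(data) // 2], data[-1]])[1]
    out, lo, hi = gpu_style_partition(data, pivot, num_blocks)
    return quicksort(out[:lo], num_blocks) + out[lo:hi] + quicksort(out[hi:], num_blocks)


if __name__ == "__main__":
    print(quicksort([5, 3, 8, 1, 9, 2, 7, 5, 4, 6], num_blocks=3))
    # [1, 2, 3, 4, 5, 5, 6, 7, 8, 9]
```

Because the prefix sums hand out disjoint output ranges, the scatter pass needs no locks or atomics between blocks. On a real GPU the per-block counting and scattering would themselves be split across the threads of each block, which this sketch folds into a single loop.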

Similar resources

Efficient parallelization of the genetic algorithm solution of traveling salesman problem on multi-core and many-core systems

Efficient parallelization of genetic algorithms (GAs) on state-of-the-art multi-threading or many-threading platforms is a challenge due to the difficulty of scheduling hardware resources with respect to the concurrency of threads. In this paper, for resolving the problem, a novel method is proposed, which parallelizes the GA by designing three concurrent kernels, each of which runs some depe...

Concurrent Data Structures in Architectures with Limited Shared Memory Support

The Single-chip Cloud Computer (SCC) is an experimental multicore processor created by Intel Labs for the many-core research community, to study many-core processors, their programmability and scalability in connection to communication models. It is based on a distributed memory architecture that combines fast-access, small on-chip memory with large off-chip private and shared memory. Additional...

Ultra-Low-Energy DSP Processor Design for Many-Core Parallel Applications

Background and Objectives: Digital signal processors are widely used in energy constrained applications in which battery lifetime is a critical concern. Accordingly, designing ultra-low-energy processors is a major concern. In this work and in the first step, we propose a sub-threshold DSP processor. Methods: As our baseline architecture, we use a modified version of an existing ultra-low-power...

Understanding the Performance of Concurrent Data Structures on Graphics Processors

In this paper we revisit the design of concurrent data structures – specifically queues – and examine their performance portability with regard to the move from conventional CPUs to graphics processors. We have looked at both lock-based and lock-free algorithms and have, for comparison, implemented and optimized the same algorithms on both graphics processors and multi-core CPUs. Particular int...

Scal: A Benchmarking Suite for Concurrent Data Structures

Concurrent data structures such as concurrent queues, stacks, and pools are widely used for concurrent programming of shared-memory multiprocessor and multicore machines. The key challenge is to develop data structures that are not only fast on a given machine but whose performance scales, ideally linearly, with the number of threads, cores, and processors on even bigger machines. Part of that ...

Publication date: 2011